NTCIR-5 WEB Navi-2 Experiments at Osaka Kyoiku University - Page, Anchor and Title Indexing, and In-link Count, Inter Page and Inter Site Link Analyses

Authors

  • Takashi Sato
  • Hitoshi Nakakubo
Abstract

This paper describes the experimental results of the WEB Navigational Retrieval Subtask 2 (WEB Navi-2). We built three gram-based indices: one for the text of the whole page, one for the text in title tags, and one for the text in anchor tags. Because gram-based indices cover every string in the target text, words that do not appear in dictionaries are indexed as well. We used the words in the TITLE tag of the search topics as queries. We performed three kinds of link analysis: in-link count, inter-site link analysis, and inter-page link analysis. We merged the scores from word search over the three indices with the scores from the link analyses in various ways. We found that anchor text analysis was the most effective for WEB Navi-2, and that the way page and/or title scores are merged with the anchor score needs to be devised carefully.
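The abstract does not give the index layout or the merging formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes character bigrams, a positional index held in memory, and a weighted linear merge with a log-damped in-link count. The function names (`char_ngrams`, `build_gram_index`, `count_occurrences`, `merged_score`) and the weights are hypothetical.

```python
import math
from collections import defaultdict


def char_ngrams(text, n=2):
    """Return all overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_gram_index(docs, n=2):
    """Map each gram to {doc_id: set of start positions}.

    No dictionary or morphological analysis is involved, so any
    substring of length >= n can be looked up later.
    """
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for pos, gram in enumerate(char_ngrams(text, n)):
            index[gram][doc_id].add(pos)
    return index


def count_occurrences(index, query, n=2):
    """Count exact occurrences of query per document.

    The query occurs at position p when its k-th gram is indexed
    at position p + k for every k. Queries shorter than n are not
    handled by this sketch and yield an empty result.
    """
    grams = char_ngrams(query, n)
    if not grams:
        return {}
    counts = {}
    for doc_id, starts in index.get(grams[0], {}).items():
        hits = sum(
            1
            for p in starts
            if all(
                p + k in index.get(g, {}).get(doc_id, ())
                for k, g in enumerate(grams)
            )
        )
        if hits:
            counts[doc_id] = hits
    return counts


def merged_score(page, title, anchor, inlinks,
                 weights=(1.0, 2.0, 3.0, 0.5)):
    """Hypothetical linear merge of the three index scores and in-link count.

    The weights are illustrative assumptions, not values from the paper.
    """
    w_page, w_title, w_anchor, w_link = weights
    return (w_page * page + w_title * title + w_anchor * anchor
            + w_link * math.log1p(inlinks))


# Toy page texts; separate indices for title and anchor text would be
# built the same way from their own strings.
pages = {
    "a": "osaka kyoiku university home",
    "b": "kyoto university",
}
page_index = build_gram_index(pages)
page_hits = count_occurrences(page_index, "kyoiku")
```

In this sketch the anchor weight is set highest only to mirror the finding that anchor text was the most effective signal; how the merge was actually tuned is not stated in the abstract.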


Similar resources

NTCIR-4 WEB Experiments at Osaka Kyoiku University - Static/Dynamic Scoring Using Link Structure Analysis and Web Page Grouping

We performed gram-based indexing and retrieval for the NTCIR-4 WEB task. The time required to build the indices was 34.7 hours. The size of the indices is 30.2 Gbyte. The median retrieval time per word is 26 msec. The ranking algorithm for retrieval results is based on a traditional probabilistic model. We report on the results of gram-based indexing and retrieval, and propose a scoring method based on li...


Exploiting Anchor Text for the Navigational Web Retrieval at NTCIR-5

In the Navigational Retrieval Subtask 2 (Navi-2) at the NTCIR-5 WEB Task, a hypothetical user knows a specific item (e.g., a product, company, or person) and needs to find one or more representative Web pages related to the item. This paper describes the system with which we participated in the Navi-2 subtask and reports its evaluation results. Our system uses three types of information obt...


Osaka Kyoiku University at NTCIR-10 CrossLink-2: Link Filtering by Title Tag of Corpus as a Dictionary

Our group (OKSAT) submitted two types of runs, named SMP and REF, for every subtask of NTCIR-10 Cross-lingual Link Discovery (CLLD). Our method uses the titles of Wikipedia pages (corpus) in the source language as the entries of a dictionary, so no external dictionary is required. For SMP, we aimed to discover the cross-lingual links of the actual Wikipedia; in other words, it targets the Wikipedia ground truth. For R...


NTCIR-3 WEB Experiments at Osaka Kyoiku University - Towards Index Partitioning and Parallel Retrieval

We experimented with long gram-based indices at the NTCIR-3 WEB task. Building gram-based indices requires no analyses such as morphological analysis. 2-byte characters were extracted from the NTCIR-3 ‘cooked’ version of the WEB task corpus. The total index size is 26 Gbyte and the time to build the indices is about 18 hours. The median search time per word from the index is 197 msec. The ranking algorithm used is based on a traditional...


Verification of Effective Retrieval Method for Anchor Text on Navigational Retrieval

We participated in the NTCIR-5 WEB Navigational Retrieval Subtask (Navi-2) in order to verify the most effective retrieval method for an index of anchor texts, using a retrieval system that indexed only anchor texts instead of the full texts of Web pages. We introduced retrieval methods that combine one or more of six retrieval measures: (a) anchor frequency (af), (b) reference consistency (rc), (c) ...




Publication date: 2005